[Bug] Sites with hard challenges of CloudFlare do not work with browser.newPage() and call browser.pages() Disrupted them tabs! #832

Open
@NabiKAZ

Description

The site https://www.000webhost.com/cpanel-login is behind a hard Cloudflare challenge and does not open normally, so I used the puppeteer-extra-plugin-stealth plugin.

The site does not open in a tab created with browser.newPage(). (After ticking the checkbox, we land on the same challenge page again.)
But in the first default tab, which is always open, the site does open! (We tick the checkbox and the site loads.)
This is already strange, but it gets stranger.

So I tried to reuse that first tab without calling newPage().
I first fetched the list of pages:
var pages = await browser.pages();
Then I opened the site in the first tab with pages[0].goto.
But this time the site did not open! (I mean the Cloudflare challenge could not be solved.)

It looks like the site does not open once newPage() has been called.
And once the pages() method has been called, all tabs are disrupted for this site. (Even the first tab, which can normally open it.)

I am confused, and I think there is a bug here.

Sample code:

import puppeteerExtra from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

var puppeteer = puppeteerExtra.use(StealthPlugin());

var browser = await puppeteer.launch({ headless: false });
// const page = await browser.newPage();

var pages = await browser.pages();
await pages[0].goto('https://www.000webhost.com/cpanel-login');

Versions:

node v19.7.0
puppeteer@21.1.1
puppeteer-extra@3.3.6
puppeteer-extra-plugin-stealth@2.11.2
Chrome Version 116.0.5845.141

Video:

Code_gxetp3Hf63.mp4

Activity

mowatermelon commented on Sep 4, 2023

You can give this a try:
await puppeteer.launch({ userDataDir: path.join(os.homedir(), '.aaa-data') })
This adds a shared cache folder, so all tabs that navigate to the same address can share one cache.

NabiKAZ commented on Sep 5, 2023
Author

@mowatermelon
Thanks for your answer.

It is not bad as a temporary trick, but it has problems, and of course the bug still exists.

By setting userDataDir we keep the previous state, so after solving the challenge manually once, the next time Chrome starts the site opens without the challenge page.

But when the Cloudflare session cookie expires, everything goes back to the way it was. If we use newPage(), for example, the challenge will not be solved. And if we reach the first tab through pages()[0], everything is broken there too, and the site's challenge cannot be solved in any way.

The only way out is to repeat the trick: temporarily remove the call to pages(), run Chrome once to pass the challenge, then put the call back into the code.

Overall, it did not cure much of the pain!
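
Since the userDataDir workaround only holds while the clearance cookie is valid, it can help to check for that before relying on it. The following is a minimal sketch, not part of the original report: hasValidClearance is a hypothetical helper name, and it assumes the cookie objects look like those returned by Puppeteer's page.cookies() (name, value, and expires as Unix seconds, with -1 for session cookies).

```javascript
// Hypothetical helper: returns true when `cookies` holds a cf_clearance
// cookie that has not expired yet.
// Assumption: cookies are shaped like the output of page.cookies(),
// where `expires` is a Unix timestamp in seconds and -1 means "session cookie".
function hasValidClearance(cookies, nowSeconds = Date.now() / 1000) {
  const clearance = cookies.find((c) => c.name === 'cf_clearance');
  if (!clearance || !clearance.value) return false;
  // A session cookie has no expiry date of its own.
  if (clearance.expires === -1) return true;
  return clearance.expires > nowSeconds;
}
```

With a live page this could be called as hasValidClearance(await page.cookies()). A false result would mean the manual challenge step has to be repeated.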

NoeelGz commented on Sep 11, 2023

Same problem :(

wlc108 commented on Sep 16, 2023

I'm experiencing the same thing on other sites. From my analysis, what seems to be happening is that the initial tab is "untouched" by Puppeteer, so I can do whatever I want in that tab without being detected. If I have Puppeteer open its own tab and then take all actions in that tab manually, I get detected as a bot. Likewise, if I have Puppeteer make the initial tab active and then take manual action, I'm detected as a bot.

So it seems any tab that gets "touched" by Puppeteer is detected somehow. Browser fingerprinting results seem to confirm this. I'm not sure what they're using to detect Puppeteer even with stealth enabled.

NodePuppeteer commented on Sep 26, 2023

creepjs (https://abrahamjuliot.github.io/creepjs/) detects Puppeteer when you use any tab other than the start-up tab.

image

Note that 205 lies are detected. Here is a sample:
image

I'm launching the latest version of Google Chrome (Version 117.0.5938.92 (Official Build) (64-bit)) on Windows 10, with the latest Puppeteer version for Node.js and the stealth plugin with all evasions active.

Another creepjs image:
image

joeledwardson commented on Sep 30, 2023

Any ideas about how Cloudflare is detecting Puppeteer?

I don't know the ins and outs of Puppeteer, but I know it uses the Chrome DevTools Protocol, the same protocol Chrome DevTools itself uses.

However, if I open https://nowsecure.nl/ with Chrome DevTools open, I get through fine.

NabiKAZ commented on Oct 1, 2023
Author

Unfortunately, this problem turned out to be very serious and acute, so I had to use the FlareSolverr service. It is a proxy for bypassing Cloudflare, and it uses Selenium.
I just send my first page request to it, take only the cf_clearance cookie from the result, set it in my Puppeteer session, and continue...
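
The cookie hand-off described above can be sketched roughly as follows. This is not the author's code: clearanceFromSolution is a hypothetical helper, and the response shape (solution.cookies, solution.userAgent) is an assumption based on FlareSolverr's documented output, so it should be checked against the version in use.

```javascript
// Hypothetical helper: picks the cf_clearance cookie out of a
// FlareSolverr-style response and reshapes it for page.setCookie().
// Assumption: the response carries `solution.cookies` (an array of cookie
// objects) and `solution.userAgent`.
function clearanceFromSolution(response) {
  const solution = response && response.solution;
  if (!solution || !Array.isArray(solution.cookies)) return null;
  const cookie = solution.cookies.find((c) => c.name === 'cf_clearance');
  if (!cookie) return null;
  return {
    cookie: {
      name: cookie.name,
      value: cookie.value,
      domain: cookie.domain,
      path: cookie.path || '/',
    },
    // Returned alongside the cookie so the caller can reuse the same user agent.
    userAgent: solution.userAgent,
  };
}
```

On the Puppeteer side, the result would then be applied with page.setUserAgent(result.userAgent) and page.setCookie(result.cookie) before navigating. Reusing the solver's user agent is a precaution here, on the assumption that the clearance is tied to the client that earned it.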

wlc108 commented on Oct 2, 2023

Puppeteer can be detected by https://abrahamjuliot.github.io/creepjs/ if that helps. It's not just that it detects a bot, it detects PUPPETEER.
pup

ergcode commented on Oct 4, 2023

> Unfortunately, this problem turned out to be very serious and acute, so I had to use the FlareSolverr service. It is a proxy for bypassing Cloudflare, and it uses Selenium. I just send my first page request to it, take only the cf_clearance cookie from the result, set it in my Puppeteer session, and continue...

For now this is the only solution; there are no alternatives. Puppeteer or Playwright, combined with any anti-detection methods, cannot get past Cloudflare.

One clarification:
If your IP or proxy is not on the list of suspicious ones, you may get challenge v1, which can be completed without a click when the browser is launched with await puppeteer.launch({ targetFilter: (target) => !!target.url() });
But if your IP is suspicious, then targetFilter: (target) => !!target.url() blocks work with the iframe, and it is no longer possible to manipulate the challenge checkbox.
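
To make the predicate concrete, here is a small sketch of what (target) => !!target.url() accepts and rejects. The fakeTargets objects are hypothetical stand-ins for Puppeteer targets, used only so the predicate can run outside a browser. The sketch does not explain why the challenge iframe ends up excluded; the thread does not establish that.

```javascript
// The predicate from the launch options: keep only targets that report
// a non-empty URL.
const hasUrl = (target) => !!target.url();

// Hypothetical stand-ins for Puppeteer Target objects (illustration only).
const fakeTargets = [
  { url: () => 'https://example.com/' },
  { url: () => '' },
];

// Only the first stand-in passes the filter.
const kept = fakeTargets.filter(hasUrl).map((t) => t.url());
```

Any target that the filter rejects is not attached to by Puppeteer, which is consistent with the report that the checkbox inside the iframe can no longer be driven.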

joeledwardson commented on Oct 7, 2023

How is it that FlareSolverr works? Surely if Puppeteer is detected by Cloudflare, then Selenium would be too.

ergcode commented on Oct 7, 2023

> How is it that FlareSolverr works? Surely if Puppeteer is detected by Cloudflare, then Selenium would be too.

FlareSolverr uses undetected-chromedriver (UC). UC uses completely different detection-bypass methods, which are still difficult to replicate in puppeteer and puppeteer-extra.

mdervisaygan commented on Oct 13, 2023

Hello,
https://www.npmjs.com/package/cloudflare-scraper
With this library you can scrape and get cookies.

39 remaining items

vladtreny commented on Jun 15, 2024

Remove the --single-process arg; Cloudflare detects it.

AntonPolyakin commented on Jun 15, 2024

I tried it, and it didn't work. I used my current arguments with puppeteer version 9.1.1, and they worked then.
Thank you for your help. There is a lot to go through; I will start by downgrading my current version of puppeteer to 5.5.0, as previously recommended in #832 (comment).

vladtreny commented on Jun 15, 2024

As an experiment, try upgrading to the latest version (22) and test again.

Cloudflare detects --single-process in the iframe. Remove it.

The IP you provided has a fraud score of 100 and a bot status of true. Maybe they do not like the IP or the DC (pq hosting), but I think the chances of that are low.

{
  "success": true,
  "message": "Success",
  "fraud_score": 100,
  "country_code": "KZ",
  "region": "Almaty",
  "city": "Almaty",
  "ISP": "Stark Industries Solutions",
  "ASN": 44477,
  "organization": "Stark Industries Solutions",
  "is_crawler": false,
  "timezone": "Asia/Almaty",
  "mobile": false,
  "host": "kzfull.privateip.net",
  "proxy": true,
  "vpn": true,
  "tor": false,
  "active_vpn": false,
  "active_tor": false,
  "recent_abuse": true,
  "bot_status": true,
  "zip_code": "N/A",
  "latitude": 43.25999832,
  "longitude": 76.93000031,
  "request_id": "OHwbkYSt9R"
}

AntonPolyakin commented on Jun 15, 2024

I have a method that partially solves my problem. It involves running two browsers: one that I use locally to pass the Cloudflare check, and a second, identical browser (same version) running remotely. The remote browser is proxied and uses the user data that I pass to it. This method works, but it is extremely inconvenient.

A browser started by Puppeteer always triggers an endless Cloudflare verification loop, whereas a manually launched browser does not. Clearly, Cloudflare detects the use of Puppeteer. Out of curiosity, I compared all properties of the window object in a headless browser and in a regular browser. I did not notice any significant differences except one, which websites can use as an indicator of Puppeteer: the document.createElement function. For some reason, Puppeteer uses a proxied version of this function instead of the standard one, which is easy to detect with the following code:

function isPuppeteer() {
  // In an unmodified browser, createElement is inherited from Document.prototype,
  // so it is not an own property of document.
  return Object.getOwnPropertyNames(document).includes('createElement');
}

This needs to be fixed to make Puppeteer less detectable. Please help me reach the developers by commenting on the discussion I opened, puppeteer/puppeteer#12589, so that this problem does not go unnoticed.
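
The check works because an inherited method does not show up among an object's own property names, while a method assigned directly onto the object does. The sketch below reproduces that outside a browser. FakeDocument and hasOwnCreateElement are hypothetical names, and the patching step only imitates what a library wrapping the method in a Proxy might do.

```javascript
// Same test as isPuppeteer(), but with the document passed in so that it
// can run without a browser.
function hasOwnCreateElement(doc) {
  return Object.getOwnPropertyNames(doc).includes('createElement');
}

// Hypothetical stand-in for a document: createElement lives on the prototype,
// as it does in an unmodified browser.
class FakeDocument {
  createElement(tag) {
    return { tagName: tag.toUpperCase() };
  }
}

const untouched = new FakeDocument();

// Imitates a library that wraps the method in a Proxy and assigns the wrapper
// onto the instance, which creates an own property.
const patched = new FakeDocument();
patched.createElement = new Proxy(patched.createElement, {});
```

Here hasOwnCreateElement(untouched) is false and hasOwnCreateElement(patched) is true, even though both objects still create elements the same way.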

vladtreny commented on Jun 15, 2024

I get false with your example, on Puppeteer 22.

Either you did not use stealth.enabledEvasions.delete('iframe.contentWindow'), or you added extra command-line args.

vladtreny commented on Jun 15, 2024

This issue has nothing to do with puppeteer. This library (puppeteer-extra) sets a proxy on createElement, but only in headless mode.

Can you package all your files, including package.json, and upload them here so the problem can be reproduced?

AntonPolyakin commented on Jun 15, 2024

I used these stealthPlugin methods, and I also disabled all browser startup arguments.
The main condition is that the browser must not be connected to any user data (userDataDir). If you don't have a problem with Cloudflare, then maybe the site is picking up your previously stored user data.

The same browser behaves differently depending on whether I launched it manually or it was launched via puppeteer. This is most likely a puppeteer issue.

AntonPolyakin commented on Jun 15, 2024

If it is as you wrote, then I am wrong and need to remove puppeteer-extra, because it leaves traces.

vladtreny commented on Jun 15, 2024

Yes, it is true. puppeteer-extra is a bad idea now; I have not used it for years.
But it is still possible to use. You are probably doing something wrong.
I need to see the full source code, with all files, such as package.json.

AntonPolyakin commented on Jun 15, 2024

I can't share the project. But the code in it is very simple: I open a page with Cloudflare protection, and I can't get through it manually.
Disabling puppeteer-extra didn't help either.

vladtreny commented on Jun 15, 2024

You have a bug somewhere.
I suspect it is in the command-line args.
I can't suggest anything without being able to reproduce it.
Everything you sent works.

AntonPolyakin commented on Jun 15, 2024

"puppeteer": "^9.1.1", "puppeteer-core": "^21.5.2", "puppeteer-extra": "^3.3.6", "puppeteer-extra-plugin-stealth": "^2.11.2",
I can't update puppeteer; the new version has a bug that makes it incompatible with my project:
puppeteer/puppeteer#12588

AntonPolyakin commented on Jun 15, 2024

@vladtreny thanks for the help. You were right: the --single-process argument was the problem. When I first turned it off I did not notice any difference at all, and the looping continued. Now I tested again, this time waiting a bit longer, and after a few iterations the captcha was passed. Now I can sleep well.
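
Removing the argument can be done in one place, before the list reaches puppeteer.launch({ args }). The following is a minimal sketch under that assumption. FLAGGED_ARGS and withoutFlaggedArgs are hypothetical names, and --single-process is the only switch this thread identifies.

```javascript
// Switches reported in this thread as detectable. Only --single-process is named.
const FLAGGED_ARGS = new Set(['--single-process']);

// Returns a copy of `args` without flagged switches. Matching is done on the
// switch name, so a "--flag=value" form would be dropped as well.
function withoutFlaggedArgs(args) {
  return args.filter((arg) => !FLAGGED_ARGS.has(arg.split('=')[0]));
}
```

For example, withoutFlaggedArgs(['--no-sandbox', '--single-process']) leaves only '--no-sandbox'. As the comment above notes, the challenge may still need a few cycles before it passes.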

klimsava commented on Oct 16, 2024

> Remove the --single-process arg; Cloudflare detects it.

Please share which other arguments are undesirable.


      Participants

      @NabiKAZ@nhhoang@phapntm@wlc108@vladtreny


        [Bug] Sites with hard challenges of CloudFlare do not work with `browser.newPage()` and call `browser.pages()` Disrupted them tabs! · Issue #832 · berstend/puppeteer-extra